Overview

Dataset statistics

Number of variables: 7
Number of observations: 122265
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.5 MiB
Average record size in memory: 56.0 B
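
The size figures above can be cross-checked against counts reported elsewhere in this report (3135 distinct cfips values and 39 distinct first_day_of_month values). The sketch below is a rough sanity check, not the profiler's own calculation: it assumes 8 bytes per cell and a 128-byte index, neither of which the report states.

```python
# Figures copied from the report.
N_ROWS = 122265
N_COLUMNS = 7
N_COUNTIES = 3135  # distinct cfips values
N_MONTHS = 39      # distinct first_day_of_month values

# Assumptions, not stated in the report: shallow memory accounting with
# 8 bytes per cell and a 128-byte index.
BYTES_PER_CELL = 8
INDEX_BYTES = 128


def column_bytes(n_rows: int) -> int:
    """Approximate size of one column plus the index, in bytes."""
    return n_rows * BYTES_PER_CELL + INDEX_BYTES


def frame_bytes(n_rows: int, n_columns: int) -> int:
    """Approximate size of the whole table plus the index, in bytes."""
    return n_rows * n_columns * BYTES_PER_CELL + INDEX_BYTES


# Every county appears once per month.
print(N_COUNTIES * N_MONTHS == N_ROWS)
# Per-variable "Memory size" in KiB.
print(round(column_bytes(N_ROWS) / 1024, 1))
# "Total size in memory" in MiB.
print(round(frame_bytes(N_ROWS, N_COLUMNS) / 1024**2, 1))
# "Average record size in memory" in bytes.
print(round(frame_bytes(N_ROWS, N_COLUMNS) / N_ROWS, 1))
```

Under these assumptions the results agree with the reported 955.3 KiB per variable, 6.5 MiB in total and 56.0 B per record.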

Variable types

Categorical: 3
Numeric: 3
DateTime: 1

Alerts

row_id has a high cardinality: 122265 distinct values (High cardinality)
county has a high cardinality: 1871 distinct values (High cardinality)
state has a high cardinality: 51 distinct values (High cardinality)
cfips is highly overall correlated with state (High correlation)
microbusiness_density is highly overall correlated with active (High correlation)
active is highly overall correlated with microbusiness_density (High correlation)
state is highly overall correlated with cfips (High correlation)
row_id is uniformly distributed (Uniform)
row_id has unique values (Unique)

Reproduction

Analysis started: 2023-01-08 06:05:05.504744
Analysis finished: 2023-01-08 06:08:21.430547
Duration: 3 minutes and 15.93 seconds
Software version: pandas-profiling v3.6.2
Download configuration: config.json

Variables

row_id
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct: 122265
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 955.3 KiB

Value                   Count
1001_2019-08-01         1
39099_2022-07-01        1
39101_2020-04-01        1
39101_2020-03-01        1
39101_2020-02-01        1
Other values (122260)   122260

Length

Max length: 16
Median length: 16
Mean length: 15.899841
Min length: 15
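
The two possible lengths follow from the row_id layout visible in the sample rows: a cfips code, an underscore, and a 10-character date. The sketch below is a worked check rather than part of the report; it uses the row count and the total character count reported for this variable (1943994) to recover how many rows carry a 4-digit versus a 5-digit cfips.

```python
# Figures copied from the report.
N_ROWS = 122265
TOTAL_CHARACTERS = 1943994
DATE_LENGTH = len("2019-08-01")  # 10 characters


def row_id_length(cfips_digits: int) -> int:
    """Length of '<cfips>_<YYYY-MM-DD>' for a cfips with the given digit count."""
    return cfips_digits + 1 + DATE_LENGTH


short_len = row_id_length(4)  # 15, the reported minimum
long_len = row_id_length(5)   # 16, the reported maximum

# Each long row_id contributes exactly one character more than a short one.
long_rows = TOTAL_CHARACTERS - short_len * N_ROWS
short_rows = N_ROWS - long_rows
mean_length = TOTAL_CHARACTERS / N_ROWS

print(short_len, long_len)
print(short_rows, long_rows)
print(round(mean_length, 6))
```

Both row counts are multiples of 39, the number of distinct months, which corresponds to 314 counties with a 4-digit cfips and 2821 with a 5-digit cfips.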

Characters and Unicode

Total characters: 1943994
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
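
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to count the general categories in the first sample value of row_id. It is a minimal example and not the profiler's own implementation; script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Nd = Decimal Number, Pd = Dash Punctuation, Pc = Connector Punctuation.
counts = category_counts("1001_2019-08-01")
for code, n in sorted(counts.items()):
    print(code, n)
```

The three category codes found here correspond to the three distinct categories the report lists for row_id.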

Unique

Unique: 122265
Unique (%): 100.0%

Sample

1st row: 1001_2019-08-01
2nd row: 1001_2019-09-01
3rd row: 1001_2019-10-01
4th row: 1001_2019-11-01
5th row: 1001_2019-12-01

Common Values

Value                   Count    Frequency (%)
1001_2019-08-01         1        < 0.1%
39099_2022-07-01        1        < 0.1%
39101_2020-04-01        1        < 0.1%
39101_2020-03-01        1        < 0.1%
39101_2020-02-01        1        < 0.1%
39101_2020-01-01        1        < 0.1%
39101_2019-12-01        1        < 0.1%
39101_2019-11-01        1        < 0.1%
39101_2019-10-01        1        < 0.1%
39101_2019-09-01        1        < 0.1%
Other values (122255)   122255   > 99.9%

Length

[Figure: Histogram of lengths of the category]

Words

Value                   Count    Frequency (%)
1001_2019-08-01         1        < 0.1%
1001_2020-12-01         1        < 0.1%
1001_2019-12-01         1        < 0.1%
1001_2020-01-01         1        < 0.1%
1001_2020-02-01         1        < 0.1%
1001_2020-03-01         1        < 0.1%
1001_2020-04-01         1        < 0.1%
1001_2020-05-01         1        < 0.1%
1001_2020-06-01         1        < 0.1%
1001_2020-07-01         1        < 0.1%
Other values (122255)   122255   > 99.9%

Most occurring characters

Value              Count     Frequency (%)
0                  488994    25.2%
1                  342018    17.6%
2                  336657    17.3%
-                  244530    12.6%
_                  122265    6.3%
3                  78552     4.0%
9                  73143     3.8%
5                  67827     3.5%
7                  58506     3.0%
4                  54177     2.8%
Other values (2)   77325     4.0%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          1577199   81.1%
Dash Punctuation        244530    12.6%
Connector Punctuation   122265    6.3%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
0       488994   31.0%
1       342018   21.7%
2       336657   21.3%
3       78552    5.0%
9       73143    4.6%
5       67827    4.3%
7       58506    3.7%
4       54177    3.4%
8       43662    2.8%
6       33663    2.1%

Dash Punctuation
Value   Count    Frequency (%)
-       244530   100.0%

Connector Punctuation
Value   Count    Frequency (%)
_       122265   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   1943994   100.0%

Most frequent character per script

Common
Value              Count    Frequency (%)
0                  488994   25.2%
1                  342018   17.6%
2                  336657   17.3%
-                  244530   12.6%
_                  122265   6.3%
3                  78552    4.0%
9                  73143    3.8%
5                  67827    3.5%
7                  58506    3.0%
4                  54177    2.8%
Other values (2)   77325    4.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1943994   100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
0                  488994   25.2%
1                  342018   17.6%
2                  336657   17.3%
-                  244530   12.6%
_                  122265   6.3%
3                  78552    4.0%
9                  73143    3.8%
5                  67827    3.5%
7                  58506    3.0%
4                  54177    2.8%
Other values (2)   77325    4.0%

cfips
Real number (ℝ)

Distinct: 3135
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30376.038
Minimum: 1001
Maximum: 56045
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 955.3 KiB

Quantile statistics

Minimum: 1001
5-th percentile: 5095
Q1: 18177
Median: 29173
Q3: 45077
95-th percentile: 53065
Maximum: 56045
Range: 55044
Interquartile range (IQR): 26900

Descriptive statistics

Standard deviation: 15143.509
Coefficient of variation (CV): 0.4985347
Kurtosis: -1.0974534
Mean: 30376.038
Median Absolute Deviation (MAD): 12012
Skewness: -0.077451731
Sum: 3.7139262 × 10^9
Variance: 2.2932586 × 10^8
Monotonicity: Increasing
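
Several of the cfips statistics are simple functions of the others. The sketch below recomputes the range, interquartile range, coefficient of variation and variance from the reported quantiles, mean and standard deviation. It is a consistency check on the printed (rounded) figures, so small differences in the last digits are expected.

```python
# Figures copied from the cfips panels of the report.
minimum, q1, q3, maximum = 1001, 18177, 45077, 56045
mean = 30376.038
std = 15143.509

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range)
print(iqr)
print(round(cv, 5))
print(f"{variance:.4e}")
```

The recomputed values agree with the reported range (55044), IQR (26900), CV (0.4985347) and variance (2.2932586 × 10^8) up to rounding.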
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count    Frequency (%)
1001                  39       < 0.1%
39133                 39       < 0.1%
39089                 39       < 0.1%
39091                 39       < 0.1%
39093                 39       < 0.1%
39095                 39       < 0.1%
39097                 39       < 0.1%
39099                 39       < 0.1%
39101                 39       < 0.1%
39103                 39       < 0.1%
Other values (3125)   121875   99.7%

Extreme values (10 smallest)

Value   Count   Frequency (%)
1001    39      < 0.1%
1003    39      < 0.1%
1005    39      < 0.1%
1007    39      < 0.1%
1009    39      < 0.1%
1011    39      < 0.1%
1013    39      < 0.1%
1015    39      < 0.1%
1017    39      < 0.1%
1019    39      < 0.1%

Extreme values (10 largest)

Value   Count   Frequency (%)
56045   39      < 0.1%
56043   39      < 0.1%
56041   39      < 0.1%
56039   39      < 0.1%
56037   39      < 0.1%
56035   39      < 0.1%
56033   39      < 0.1%
56031   39      < 0.1%
56029   39      < 0.1%
56027   39      < 0.1%

county
Categorical

Distinct: 1871
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 955.3 KiB

Value                 Count
Washington County     1170
Jefferson County      975
Franklin County       936
Lincoln County        897
Jackson County        897
Other values (1866)   117390

Length

Max length: 33
Median length: 28
Mean length: 14.016906
Min length: 10

Characters and Unicode

Total characters: 1713777
Distinct characters: 57
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Autauga County
2nd row: Autauga County
3rd row: Autauga County
4th row: Autauga County
5th row: Autauga County

Common Values

Value                 Count    Frequency (%)
Washington County     1170     1.0%
Jefferson County      975      0.8%
Franklin County       936      0.8%
Lincoln County        897      0.7%
Jackson County        897      0.7%
Madison County        741      0.6%
Montgomery County     702      0.6%
Clay County           702      0.6%
Union County          663      0.5%
Monroe County         663      0.5%
Other values (1861)   113919   93.2%

Length

[Figure: Histogram of lengths of the category]

Words

Value                 Count    Frequency (%)
county                117195   46.2%
parish                2496     1.0%
city                  1716     0.7%
washington            1209     0.5%
jefferson             1092     0.4%
st                    1014     0.4%
franklin              1014     0.4%
lincoln               936      0.4%
jackson               936      0.4%
madison               780      0.3%
Other values (1857)   125073   49.3%

Most occurring characters

Value               Count    Frequency (%)
n                   189813   11.1%
o                   184587   10.8%
t                   157482   9.2%
u                   139581   8.1%
C                   133224   7.8%
y                   132171   7.7%
(space)             131196   7.7%
a                   87360    5.1%
e                   84162    4.9%
r                   62478    3.6%
Other values (47)   411723   24.0%

Most occurring categories

Value               Count     Frequency (%)
Lowercase Letter    1327638   77.5%
Uppercase Letter    253500    14.8%
Space Separator     131196    7.7%
Other Punctuation   1209      0.1%
Dash Punctuation    195       < 0.1%
Math Symbol         39        < 0.1%

Most frequent character per category

Lowercase Letter
Value               Count    Frequency (%)
n                   189813   14.3%
o                   184587   13.9%
t                   157482   11.9%
u                   139581   10.5%
y                   132171   10.0%
a                   87360    6.6%
e                   84162    6.3%
r                   62478    4.7%
i                   49179    3.7%
l                   49179    3.7%
Other values (16)   191646   14.4%

Uppercase Letter
Value               Count    Frequency (%)
C                   133224   52.6%
M                   12090    4.8%
S                   11037    4.4%
P                   10218    4.0%
B                   10140    4.0%
W                   8775     3.5%
L                   8697     3.4%
H                   7722     3.0%
G                   6084     2.4%
D                   5694     2.2%
Other values (16)   39819    15.7%

Other Punctuation
Value   Count   Frequency (%)
.       1053    87.1%
'       156     12.9%

Space Separator
Value     Count    Frequency (%)
(space)   131196   100.0%

Dash Punctuation
Value   Count   Frequency (%)
-       195     100.0%

Math Symbol
Value   Count   Frequency (%)
±       39      100.0%

Most occurring scripts

Value    Count     Frequency (%)
Latin    1581138   92.3%
Common   132639    7.7%

Most frequent character per script

Latin
Value               Count    Frequency (%)
n                   189813   12.0%
o                   184587   11.7%
t                   157482   10.0%
u                   139581   8.8%
C                   133224   8.4%
y                   132171   8.4%
a                   87360    5.5%
e                   84162    5.3%
r                   62478    4.0%
i                   49179    3.1%
Other values (42)   361101   22.8%

Common
Value     Count    Frequency (%)
(space)   131196   98.9%
.         1053     0.8%
-         195      0.1%
'         156      0.1%
±         39       < 0.1%

Most occurring blocks

Value         Count     Frequency (%)
ASCII         1713699   > 99.9%
Latin 1 Sup   78        < 0.1%

Most frequent character per block

ASCII
Value               Count    Frequency (%)
n                   189813   11.1%
o                   184587   10.8%
t                   157482   9.2%
u                   139581   8.1%
C                   133224   7.8%
y                   132171   7.7%
(space)             131196   7.7%
a                   87360    5.1%
e                   84162    4.9%
r                   62478    3.6%
Other values (45)   411645   24.0%

Latin 1 Sup
Value   Count   Frequency (%)
à       39      50.0%
±       39      50.0%

state
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 51
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 955.3 KiB

Value               Count
Texas               9906
Georgia             6201
Virginia            5070
Kentucky            4680
Missouri            4485
Other values (46)   91923

Length

Max length: 20
Median length: 13
Mean length: 8.0810207
Min length: 4

Characters and Unicode

Total characters: 988026
Distinct characters: 46
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alabama
2nd row: Alabama
3rd row: Alabama
4th row: Alabama
5th row: Alabama

Common Values

Value               Count   Frequency (%)
Texas               9906    8.1%
Georgia             6201    5.1%
Virginia            5070    4.1%
Kentucky            4680    3.8%
Missouri            4485    3.7%
Kansas              4095    3.3%
Illinois            3978    3.3%
North Carolina      3900    3.2%
Iowa                3861    3.2%
Tennessee           3705    3.0%
Other values (41)   72384   59.2%

Length

[Figure: Histogram of lengths of the category]

Words

Value               Count   Frequency (%)
texas               9906    7.1%
virginia            7215    5.2%
georgia             6201    4.4%
north               5967    4.3%
carolina            5694    4.1%
new                 4914    3.5%
kentucky            4680    3.3%
dakota              4602    3.3%
missouri            4485    3.2%
south               4329    3.1%
Other values (45)   81900   58.5%

Most occurring characters

Value               Count    Frequency (%)
a                   134082   13.6%
i                   106626   10.8%
n                   84630    8.6%
s                   83148    8.4%
o                   79755    8.1%
e                   60099    6.1%
r                   50700    5.1%
t                   32292    3.3%
l                   31590    3.2%
h                   25467    2.6%
Other values (36)   299637   30.3%

Most occurring categories

Value              Count    Frequency (%)
Lowercase Letter   830544   84.1%
Uppercase Letter   139854   14.2%
Space Separator    17628    1.8%

Most frequent character per category

Lowercase Letter
Value               Count    Frequency (%)
a                   134082   16.1%
i                   106626   12.8%
n                   84630    10.2%
s                   83148    10.0%
o                   79755    9.6%
e                   60099    7.2%
r                   50700    6.1%
t                   32292    3.9%
l                   31590    3.8%
h                   25467    3.1%
Other values (14)   142155   17.1%

Uppercase Letter
Value               Count   Frequency (%)
M                   19890   14.2%
N                   15132   10.8%
T                   13611   9.7%
I                   13338   9.5%
C                   10803   7.7%
K                   8775    6.3%
O                   7839    5.6%
V                   7761    5.5%
W                   7371    5.3%
A                   7176    5.1%
Other values (11)   28158   20.1%

Space Separator
Value     Count   Frequency (%)
(space)   17628   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Latin    970398   98.2%
Common   17628    1.8%

Most frequent character per script

Latin
Value               Count    Frequency (%)
a                   134082   13.8%
i                   106626   11.0%
n                   84630    8.7%
s                   83148    8.6%
o                   79755    8.2%
e                   60099    6.2%
r                   50700    5.2%
t                   32292    3.3%
l                   31590    3.3%
h                   25467    2.6%
Other values (35)   282009   29.1%

Common
Value     Count   Frequency (%)
(space)   17628   100.0%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   988026   100.0%

Most frequent character per block

ASCII
Value               Count    Frequency (%)
a                   134082   13.6%
i                   106626   10.8%
n                   84630    8.6%
s                   83148    8.4%
o                   79755    8.1%
e                   60099    6.1%
r                   50700    5.1%
t                   32292    3.3%
l                   31590    3.2%
h                   25467    2.6%
Other values (36)   299637   30.3%

first_day_of_month
DateTime

Distinct: 39
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 955.3 KiB
Minimum: 2019-01-08 00:00:00
Maximum: 2022-01-10 00:00:00
[Figure: Histogram with fixed size bins (bins=39)]
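
The minimum and maximum of this date variable, together with the sample rows at the end of the report, suggest that day and month are swapped relative to the date embedded in row_id: the first sample row has row_id 1001_2019-08-01 but a date of 2019-01-08. One possible explanation, offered here as an assumption and not something the report states, is that the date strings were parsed day-first. The sketch below reproduces the effect with the standard library.

```python
from datetime import datetime


def parse_both_ways(row_id: str) -> tuple[str, str]:
    """Parse the date part of a row_id as written and with day/month swapped."""
    date_text = row_id.split("_", 1)[1]
    as_written = datetime.strptime(date_text, "%Y-%m-%d")
    # Hypothetical mis-parse: the middle field is read as the day.
    swapped = datetime.strptime(date_text, "%Y-%d-%m")
    return as_written.date().isoformat(), swapped.date().isoformat()


print(parse_both_ways("1001_2019-08-01"))
print(parse_both_ways("56045_2022-10-01"))
```

With the swapped reading, the first and last row_id values map to 2019-01-08 and 2022-01-10, which are the minimum and maximum reported for this variable.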

microbusiness_density
Real number (ℝ)

Distinct: 97122
Distinct (%): 79.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8176706
Minimum: 0
Maximum: 284.34003
Zeros: 26
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 955.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.81756973
Q1: 1.6393442
Median: 2.5865433
Q3: 4.5192308
95-th percentile: 10.55302
Maximum: 284.34003
Range: 284.34003
Interquartile range (IQR): 2.8798866

Descriptive statistics

Standard deviation: 4.9910868
Coefficient of variation (CV): 1.3073645
Kurtosis: 556.68529
Mean: 3.8176706
Median Absolute Deviation (MAD): 1.1668079
Skewness: 15.970181
Sum: 466767.49
Variance: 24.910947
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                  Count    Frequency (%)
0                      26       < 0.1%
1.8518518              20       < 0.1%
2.1276596              19       < 0.1%
1.6393442              19       < 0.1%
2.9940119              18       < 0.1%
0.97719872             18       < 0.1%
1.25                   18       < 0.1%
0.93457943             17       < 0.1%
1.4925373              17       < 0.1%
1.369863               16       < 0.1%
Other values (97112)   122077   99.8%

Extreme values (10 smallest)

Value         Count   Frequency (%)
0             26      < 0.1%
0.063836582   12      < 0.1%
0.064516127   12      < 0.1%
0.066711143   1       < 0.1%
0.069662139   5       < 0.1%
0.08605852    1       < 0.1%
0.088652484   7       < 0.1%
0.10006671    9       < 0.1%
0.14338386    4       < 0.1%
0.15362556    6       < 0.1%

Extreme values (10 largest)

Value       Count   Frequency (%)
284.34003   1       < 0.1%
277.53598   1       < 0.1%
227.75665   1       < 0.1%
224.53825   1       < 0.1%
217.58711   1       < 0.1%
217.25502   1       < 0.1%
217.1413    1       < 0.1%
210.0473    1       < 0.1%
208.22719   1       < 0.1%
206.80765   1       < 0.1%

active
Real number (ℝ)

Distinct: 19193
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6442.8582
Minimum: 0
Maximum: 1167744
Zeros: 26
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 955.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 35
Q1: 145
Median: 488
Q3: 2124
95-th percentile: 26306.2
Maximum: 1167744
Range: 1167744
Interquartile range (IQR): 1979

Descriptive statistics

Standard deviation: 33040.012
Coefficient of variation (CV): 5.1281607
Kurtosis: 471.82158
Mean: 6442.8582
Median Absolute Deviation (MAD): 419
Skewness: 17.572118
Sum: 7.8773606 × 10^8
Variance: 1.0916424 × 10^9
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                  Count    Frequency (%)
33                     327      0.3%
36                     319      0.3%
69                     306      0.3%
32                     305      0.2%
39                     296      0.2%
63                     290      0.2%
34                     290      0.2%
37                     289      0.2%
68                     277      0.2%
76                     276      0.2%
Other values (19183)   119290   97.6%

Extreme values (10 smallest)

Value   Count   Frequency (%)
0       26      < 0.1%
1       76      0.1%
2       120     0.1%
3       94      0.1%
4       62      0.1%
5       164     0.1%
6       178     0.1%
7       119     0.1%
8       130     0.1%
9       121     0.1%

Extreme values (10 largest)

Value     Count   Frequency (%)
1167744   1       < 0.1%
1160868   1       < 0.1%
1153292   1       < 0.1%
1152842   1       < 0.1%
1151836   1       < 0.1%
1150017   1       < 0.1%
1143527   1       < 0.1%
1142598   1       < 0.1%
1142034   1       < 0.1%
1141159   1       < 0.1%

Interactions

[Figures: 9 interaction plots; images not preserved]

Correlations

[Figure: correlation plot; image not preserved]

                        cfips   microbusiness_density   active   state
cfips                   1.000   0.127                   0.073    0.987
microbusiness_density   0.127   1.000                   0.783    0.086
active                  0.073   0.783                   1.000    0.146
state                   0.987   0.086                   0.146    1.000
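
The report does not name the coefficient behind this matrix. Assuming a rank correlation (Spearman) is used for pairs of numeric variables, which is an assumption and not something stated here, the sketch below shows how such a coefficient is computed: rank both variables with ties sharing their average rank, then take the Pearson correlation of the ranks. The input is the first five sample rows of microbusiness_density and active, used purely for illustration; it does not reproduce the 0.783 computed over all 122265 rows.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First five sample rows (Autauga County, Alabama).
density = [3.007682, 2.884870, 3.055843, 2.993233, 2.993233]
active = [1249, 1198, 1269, 1243, 1243]
print(spearman(density, active))
```

Within a single county the two variables move together almost perfectly, so the ranks coincide on this tiny sample; the lower value in the matrix reflects the comparison across all counties.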

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix: a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     row_id            cfips   county           state     first_day_of_month   microbusiness_density   active
0    1001_2019-08-01   1001    Autauga County   Alabama   2019-01-08           3.007682                1249
1    1001_2019-09-01   1001    Autauga County   Alabama   2019-01-09           2.884870                1198
2    1001_2019-10-01   1001    Autauga County   Alabama   2019-01-10           3.055843                1269
3    1001_2019-11-01   1001    Autauga County   Alabama   2019-01-11           2.993233                1243
4    1001_2019-12-01   1001    Autauga County   Alabama   2019-01-12           2.993233                1243
5    1001_2020-01-01   1001    Autauga County   Alabama   2020-01-01           2.969090                1242
6    1001_2020-02-01   1001    Autauga County   Alabama   2020-01-02           2.909326                1217
7    1001_2020-03-01   1001    Autauga County   Alabama   2020-01-03           2.933231                1227
8    1001_2020-04-01   1001    Autauga County   Alabama   2020-01-04           3.000167                1255
9    1001_2020-05-01   1001    Autauga County   Alabama   2020-01-05           3.004948                1257

Last rows

         row_id             cfips   county          state     first_day_of_month   microbusiness_density   active
122255   56045_2022-01-01   56045   Weston County   Wyoming   2022-01-01           1.749688                98
122256   56045_2022-02-01   56045   Weston County   Wyoming   2022-01-02           1.749688                98
122257   56045_2022-03-01   56045   Weston County   Wyoming   2022-01-03           1.767542                99
122258   56045_2022-04-01   56045   Weston County   Wyoming   2022-01-04           1.767542                99
122259   56045_2022-05-01   56045   Weston County   Wyoming   2022-01-05           1.803249                101
122260   56045_2022-06-01   56045   Weston County   Wyoming   2022-01-06           1.803249                101
122261   56045_2022-07-01   56045   Weston County   Wyoming   2022-01-07           1.803249                101
122262   56045_2022-08-01   56045   Weston County   Wyoming   2022-01-08           1.785395                100
122263   56045_2022-09-01   56045   Weston County   Wyoming   2022-01-09           1.785395                100
122264   56045_2022-10-01   56045   Weston County   Wyoming   2022-01-10           1.785395                100